SINGLE-CROSSOVER DYNAMICS: 
FINITE VERSUS INFINITE POPULATIONS 



ELLEN BAAKE AND INKE HILDEBRANDT 

Abstract. Populations evolving under the joint influence of recombination and resampling 
(traditionally known as genetic drift) are investigated. First, we summarise and adapt a 
deterministic approach, as valid for infinite populations, which assumes continuous time and 
single crossover events. The corresponding nonlinear system of differential equations permits 
a closed solution, both in terms of the type frequencies and via linkage disequilibria of all 
orders. To include stochastic effects, we then consider the corresponding finite-population 
model, the Moran model with single crossovers, and examine it both analytically and by 
means of simulations. Particular emphasis is on the connection with the deterministic so- 
lution. If there is only recombination and every pair of recombined offspring replaces their 
pair of parents (i.e., there is no resampling), then the expected type frequencies in the finite 
population, of arbitrary size, equal the type frequencies in the infinite population. If resam- 
pling is included, the stochastic process converges, in the infinite-population limit, to the 
deterministic dynamics, which turns out to be a good approximation already for populations 
of moderate size. 



1. Introduction 

It is well known that the dynamics of populations under recombination is notoriously diffi- 
cult to treat, even if the population is so large that stochastic fluctuations due to genetic drift 
can be neglected, so that the time evolution may be described by a deterministic dynamical 
system. The difficulty lies in the inherent nonlinearity of the corresponding (difference or dif- 
ferential) equations, which stems from the interaction of pairs of parental individuals during 
recombination. 

The overwhelming part of the literature on the dynamics of recombination deals with dis- 
crete time. The first solutions go back to Geiringer (1944, [15] ) and Bennett (1954, |8J); their 
construction was more recently worked out in greater detail and completeness by Dawson 
|1 1 J . The underlying mathematical structures were investigated within the framework of ge- 
netic algebras, see [IBJCEH], or [23]. Quite generally, the solution relies on a certain nonlinear 
transformation (known as Haldane linearisation) from (gamete or haplotype) frequencies to 
suitable linkage disequilibria, which decouple from each other and decay geometrically. But if 
more than three loci are involved, this transformation must be constructed via recursions that 
involve the parameters of the recombination process, and is not available explicitly except in 
the trivial case of free recombination (i.e., independent gene loci). A different approach was 
taken by Barton and Turelli [7] by translating type frequencies into certain sets of moments; 
the resulting iterations are well-suited for symbolic manipulation, numerical treatment, and 
biological interpretation, but do not lead to (and do not primarily aim at) explicit solutions. 
For a review of the area, see [H Ch. V.4]. 

It has turned out recently [U EJ [5J that, in contrast to the discrete-time situation, the 
corresponding dynamics in continuous time permits a simple explicit solution in a biologically 
relevant special case, namely, the situation in which at most one crossover happens at any 
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given time. Here, only recombination events may occur that partition the sites of a sequence 
(or the loci on a chromosome) into two ordered subsets that correspond to the sites before 
and after a given crossover point. In contrast to previous approaches, the solution is given 
directly in terms of the original type frequencies; but, again, a transformation to certain linkage 
disequilibria is available that linearise and diagonalise the system. The main simplification lies 
in the fact that the transformation is independent of the recombination rates and is available 
in a simple explicit form. 

However, the focus in modern population genetics is on finite populations, which also ex- 
perience genetic drift (that is, resampling). In order to investigate the relevance of the de- 
terministic solution for the more involved stochastic system, we shall, in this paper, examine 
the corresponding finite population model, namely, the Moran model with recombination, and 
compare some of its properties with the infinite population model, both analytically and by 
means of simulations. We shall show the following: 

• In a finite population with recombination, but without resampling (i.e., every pair of 
recombined offspring replaces its pair of parents), the type frequencies are, in expec- 
tation, given by the corresponding quantities for infinite populations. This property 
holds exactly as long as the genetic material is conserved in the recombination process, 
and to a very good approximation otherwise. This connection between stochastic and 
deterministic models is by no means obvious: It is usually reserved for populations of 
individuals that evolve independently (like branching processes); or to systems with in- 
teractions that do not change the expected type frequencies (such as the Moran model 
with independent mutation and resampling). In contrast, interaction due to recombi- 
nation does change the expected composition of the population, and does, therefore, 
not appear to be of this simple form. 

• If the joint action of recombination and resampling is considered, the process of type 
frequencies differs from the deterministic dynamics, even in expectation, particularly 
if the population is small. It follows from general arguments, however, that the former 
converges to the latter when population size goes to infinity. By means of simulations, 
we show that this limit yields a good approximation of realistic biological situations: 
Convergence is fast enough to justify the use of the deterministic solution already for 
populations of moderate size (of the order of 10 5 individuals, say). 

The paper is organised as follows. We first recapitulate the single-crossover differential 
equation and its solution. This is appropriate since the original articles [H [6] use a rather 
abstract measure-theoretical framework suitable for very general type spaces, which is not 
easily accessible to theoretical biologists. We reduce the formalism to the important special 
case of finite type spaces, which is free from technical subtleties and permits an explicit 
representation in terms of probability vectors as well as illustration by means of concrete 
examples. We then change gears and explore the Moran model with recombination. With 
the help of standard techniques for Markov chains in continuous time, we develop the results 
stated above. 

2. The deterministic approach 

2.1. The model and its recombinator formulation. We start by extracting from j5] and 
[6 J the continuous-time dynamics of single crossovers in an infinite population. Let us consider 
genes or chromosomes as linear arrangements of sites, indexed by the set S := {0, 1, . . . ,ra}; 
sites may be interpreted as either nucleotide positions in a stretch of DNA, or gene loci on a 
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chromosome. For each site % £ S, there is a set Xj of "letters" (to be interpreted as nucleotides 
or alleles, respectively) that may possibly occur at that site. To allow for a convenient notation, 
we restrict ourselves to the simple but important case of finite sets Xf, for the full generality 
of arbitrary locally compact Xj, the reader is referred to the original articles. 

A type is thus defined as a sequence x = (x ,x 1 , . . . ,x n ) G Xq x X\ x • • • x X n =: X, 
where X shall be called the type space. By construction, x i is the i-th coordinate of x, and 
we define Xj := (xA^j as the collection of coordinates with indices in /, where I is a subset 
of S. Types may be understood as alleles (if sites are nucleotide positions) or haplotypes (if 
sites are gene loci). We shall think at the haploid level, speak of gametes as individuals, and 
describe a population as a distribution of (absolute) frequencies (or a non-negative measure) 
uj on X. Namely, uj({x}) denotes the frequency of type x G X, and uj(A) := YlxeA UJ ({ x }) f° r 
A C X; we abbreviate uj({x}) as u;(x). The set of all frequency distributions on X is denoted 
by M.. If we define 5 X as the point measure on x (i.e., S x (y) = 8 XjV for x,y € X), we can 

I v-i 

also write uj = YlxeX w(x)8 x . We may, alternatively, interpret 5 X as the basis vector of M> 
that corresponds to x (where a suitable ordering of types is implied, and \X\ is the number of 
elements in X); uj is thus identified with a vector in M> . 

At this stage, frequencies need not be normalised; u(x) may simply be thought of as the 
size of the subpopulation of type x, measured in units so large that it may be considered 
a continuous quantity. The corresponding normalised version p := cj/||u;|| (where ||u;|| := 
^^^(x) = uj(X) is the total population size) is then a probability distribution on X, and 
may be identified with a probability vector. 

Recombination acts on the links between the sites; the links are collected into the set 
L := {±, §,..., We shall use Latin indices for the sites and Greek indices for the links, 

and the implicit rule will always be that a = Si! is the link between sites i and i + 1; see 
Figure [U 
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FIGURE 1. Sites and links. 



We shall make the simplifying assumption that only single crossovers occur at a time (which 
is realistic if linkage is tight, i.e., if the sites belong to the DNA sequence of, say, less than a 
megabase, or represent a few adjacent loci). More precisely, for every a £ L, every individual 
exchanges, at rate g a /2, the sites after link a with those of a randomly chosen partner. 
Explicitly, if the 'active' and the partner individual are of type x and y, then the new pair has 
types (x ,x 1 ,...,x lai ,y^ aV ...,y n ) and (y , y v . . . , , , . . . , x n ), where LaJ(r«l) is th e 
largest integer below a (the smallest above a); see Fig. [2l Since every individual can occur as 
either the 'active' individual or as its randomly chosen partner, we have a total rate of g a for 
crossovers at link a. For later use, we also define g := X^aeL Qa- The dynamics of the type 
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frequencies is then given by the system of differential equations 

(1) U t (x) = y^a(]j 1| W t (x ,..., Xi i, *,..., *)uj t (*,... ,*,Xr a] ,... ,X n ) -Wt(x)) 

for all x G X, where a "*" at site i stands for Xj and denotes marginalisation over the 
letters at site i (i.e., c^(x , . . . , x, *) is the frequency of individuals which have letters 

x , . . . , x^j at the sites before a, and arbitrary letters from X|- Q -| , • • • , X n at the sites after a). 

Note that the assumption of a reciprocal exchange (between two parents that combine 
into two offspring under conservation of the genetic material) is an 'effective' description that 
results from taking the symmetry of recombination into account. A more detailed model starts 
out from two parents combining into one offspring; but the symmetry of the process again 
leads to Eq. [1] and is thus equivalent to our picture, see [9j Ch. II. 2.1]. In contrast to the 
deterministic situation, however, these details do play a role in finite populations; this will be 
taken up in Sec. O 




FIGURE 2. Upper panel: Recombination between individuals of type x and y. Lower 
panel: The corresponding 'marginalised' version that summarises all events by which 
individuals of type x are gained or lost. Note that, in either case, the process can go 
both ways, as indicated by the arrows. 

We have retained the non-normalised version for the frequencies here for easier comparison 
with the stochastic model (the stochastic process will come naturally as a family of integer- 
valued random variables, which sum up to the population size N). Of course, the more familiar 
normalised version emerges if wt in ([TJ) is replaced by pt := uJt/\\oJt ||; the normalising factor on 
the right-hand side is then, of course, unity. 

Note that the model ([T]) implies Hardy- Weinberg proportions (i.e., frequencies of diplotypes 
are given by independent combination of the corresponding haplotypes at all times). In con- 
tinuous time, this only applies exactly if the duration of the diplophase is negligible. However, 
it is a good approximation if recombination is rare at the scale of an individual's lifetime. 

An important ingredient to the solution of the large, nonlinear system of differential equa- 
tions ([T]) lies in its reformulation in terms of recombination operators. Let us introduce the 
projection operators n i , i 6 S, via 

, 2 \ '■ x o x Ai x • • • x X n -> Xi 

\Xq, X 1 , . . . , X n ) I— > Xj, 

i.e., 7T i is the canonical projection to the i-th coordinate. Likewise, for any index set I C S, 
one defines a projector ttj : X — > Xj := X ie jXi via (x , x l5 . . . , x n ) i— > (x i ) ieI =: Xj. We 
shall frequently use the abbreviations ir <a := i i, and ir >a := 7T/r a -i n \, as well as 

x <a := 7r <Q (x), x >a := 7r >Q (x). The projectors ir <a and ir >a may be thought of as cut and 
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forget operators because they take the leading or trailing segment of a sequence x, and forget 
about the rest. 

Whereas the ttj act on the types, we also need the induced mapping at the level of the 
population, namely, 

(3) 7ij. : M ► M 

UJ I — > TTj.LO := UJ O IX j , 

where ttJ 1 denotes the preimage under ttj. The operation . (where the dot is on the line and 
should not be confused with a multiplication sign) is known as the "pullback" of ttj w.r.t. u; 
in terms of coordinates, the definition may be spelled out as 

(lTj.Uj)(xj) = UJ O 7T^ 1 (2; J ) = uj({x € X I TTj(x) = Xj}), Xj G Xj. 

So, txj.io is nothing but the marginal distribution of oj with respect to the sites in /. In 
particular, (7r <a .u))(x <a ) = u>(x ,x l , . . . i%\ a \ ?*)•••>*) is the marginal frequency of sequences 
prescribed at the sites before a, and vice versa for the sites after a. Note that, although we 
have defined TTj. as acting on nonnegative measures (or vectors) only, this linear operator may 
be extended to the set of all measures. 

Now, recombination (at the level of the population) means the relinking of a randomly 
chosen leading segment with a randomly chosen trailing segment. We therefore introduce 
(elementary) recombination operators (or recombinators, for short), R a : M. — > M for a € L, 
defined by 

(4) R a {uj) := rr— ((7T <a .Uj) ® (7T >a .w)) . 

IMI 

Here, the tensor product reflects the independent combination (i.e., the product measure) of 
the two marginals vr <a .w and TT >a .uj. R a is therefore a cut and relink operator. R a (u) may be 
understood as the population that emerges if all individuals of the population u disintegrate 
into their leading and trailing segments, and these are relinked randomly; the sites before a 
are then in linkage equilibrium with respect to those after a. Note that ||i? a (u;)|| = 
In terms of coordinates, Eq. (J4j) reads 

1 

(5) (a?) = ,,— t Mxq, x 1: . . . , x la I ,*,... , *)u(*, ...,*, x M , ... , x n ). 

\\UJ\\ 1 J 11 

The recombination dynamics ([Q) may thus be compactly rewritten as 

(6) w t = Y, e a {Ra(ut) - <*) = Yl e a {R a - 1)0*) =: *(w t ) s 

where 1 is the identity operator. 

2.2. Solution via recombinators. The solution of j6|) relies on some elementary properties 
of our recombinators. Most importantly, they are idempotents and commute with each other, 
i.e., 

(7) R 2 a = R a , a G L, 

(8) R a Rf3 = RpRa, a,/3eL. 

These properties are intuitively very plausible: if linkage equilibrium has already been estab- 
lished at link a, then further recombination at that link does not change the situation; and if 
a product measure is formed with respect to two links a and (5, the result does not depend 
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on the order in which the links are affected. For the proof, we refer to [HI Prop. 2]; let us only 
mention here that it relies on the elementary fact that, for u G M, 

7r <a .[Rp(uj)) = ir <a .uj for (3 > a, and 
TT >a \Rp(u))) = ir >a .uj for (3 < a; 

that is, recombination at or after a does not affect the marginal frequencies at sites before a, 
and vice versa. 

We now define composite recombinators as 

R G ■= Y[ R a for GcL. 

a&G 

Here, the product is to be read as composition; it is, indeed, a matrix product if the recom- 
binators are written in their matrix representation, which is available in the case of finite 
types considered here (see [3|). By property ((SJ), the order in the composition plays no role. 
Furthermore, (jTj) and (|8j) obviously entail RqRh = Rguh for G,H C L. 

The solution of the single-crossover dynamics J6]) may now be given in closed form as 

(9) io t = ^ og(*)-Rg(^o) =: <Pt(vo) 

GCL 

with coefficients 

aa(t)= I] e-e^Hil-e-Oe*), 

aeL\G /3eG 

and initial value ujq; i.e., (ft is the semigroup belonging to the recombination equation (|6j). For 
the proof, the reader is referred to [6l Thm. 2], or [H Thm. 3] (the former article contains the 
original, the latter a shorter and more elegant version of the proof). Let us only note here that 
the coefficients ac(t) have the following intuitive explanation. In a given individual, consider 
the set of links that has, until t, been hit by at least one crossover event. The probability that 
this set is G equals the probability that all links in G have already been hit, while those in 
the complement of G have not; by independence across links, this probability is just ac(t). 

We would like to note, however, that this plausible argument should not be taken too far: 
For example, an analogous solution, although suggestive, does not apply to the corresponding 
single-crossover model in discrete time (its solution is, in fact, much more difficult, see the 
discussion in |6J). Indeed, explicit solutions to large, nonlinear systems are rare - explicit 
semigroups are usually available for linear systems at best. For this reason, the recombination 
equation and its solution have already been taken up in the framework of functional analysis, 
where they have led to an extension of potential theory |22j . In the next paragraph, we will 
meet some further clues to an underlying linear structure that lurks behind the solution. 

2.3. Linkage disequilibria. The approach described in the previous Section is somewhat 
unconventional from a population genetics perspective, in that it solves the recombination 
dynamics at the level of the type frequencies. As mentioned in the Introduction, however, one 
usually works with transformed quantities, like moments, cumulants, or linkage disequilibria 
(LDEs). Indeed, certain linkage disequilibria are intimately connected with the solution just 
presented, and provide the key to an underlying linearity, and hence simplicity, as we shall 
now see. Following [6], let us define what we shall call LDE operators via 

(10) T G := ^ (-1)™^, GCL, 

hdg 
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where the underdot indicates the summation variable. Eq. (fTOl) leads to the inverse Rq = 
^Ih^G^H ^ Mobius inversion, a versatile tool from combinatorial theory, see [H Thm. 4.18]. 
It was shown in [6] that, if cut is the solution j9]), the transformed quantities Ta(ut) satisfy 

(11) ^T G (ut) = - e a ) ?bM, Gcl, 



which is a system of decoupled, linear, homogeneous differential equations, with solution 
Tc(uJt) = exp(— J2aeL\G Qo)Tg(uo)- Note that this simple form emerged through the nonlin- 
ear transform (llOp as applied to the solution of the coupled, nonlinear differential equation 
§}. 

Obviously, the Ta(u>t) provide candidates for the definition of linkage disequilibria which de- 
couple, and decay exponentially (unless, of course, G = L, which corresponds to the stationary 
state 

1 n 

(12) ^oo = T L (UJ ) = _ 1 ^(jTi-Uo), 

" " 1=1 

the population in linkage equilibrium with respect to all links). All that remains to be done 
is to identify a set of suitable components to work with (using all components of Tc(LOt), for 
all G C L, would imply a lot of redundancy). 

To this end, let us introduce the following shorthand notation. Let (ji, . . . ,jk), with j\ < 
' ' ' < 3k, symbolically denote a so-called cylinder set in X that is specified at sites ji £ S, for 
1 < i < k. More precisely, these are sets of the form 



(13) (ji, . . . ,j k ) = X x • • • x X h -i x {x h } x [. . . ] x {x jk } x X jk+1 x • • • x X, 



L Jt! 



where [. . . ] contains factors {x { } or X{ depending on whether i appears in the set . . . , jk} 
or not. That is, with the cylinder set (ji, . . . ,jk), we mean the set of types that have prescribed 
letters Xj i , . . . ,x Jfe at sites j%, . . . ,jk, and arbitrary letters at all other sites. Note that the 
shorthand (ji,---,jk) is symbolic in that it does not specify the letters explicitly (the x^. 
appear on the right-hand side of (fl~3l) . but not on the left). Note also that the cylinder sets 
formalise our previous '*' notation: tut((jl, ■ ■ ■ ,jk}) is the marginal frequency of the types 
prescribed at sites ji, - ■ ■ ,jk, and (pQ) may be rewritten as 

«t«s» = E *«( A^ ' • • • ' L«j»«»«r«i, • • • > »» - 

Let us now define linkage disequilibria (or correlation functions) of order A; via 

(14) d t ({ji,...,j k )) := (T {Q<il}u{a>jfc} (pi)) (0'i, - - - ,ife)) - 

That is, for a given cylinder set, LDEs emerge by applying, to pt = ut/\\uJt\\, the LDE operator 
defined by the links before the first and after the last element of that cylinder set, and evalu- 
ating the resulting quantity at the cylinder set. Clearly, dt ({ji, ■ ■ ■ ,3k)) contains products of 
at most k marginal frequencies. Note that we have defined linkage disequilibria on the basis 
of the normalised quantities pt for the sake of compatibility, and comparability, with related 
quantities in population genetics. 
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By (fTTjl we know that, for every cylinder set ■ ■ ■ ,jk), 
(15) —d t ((j 1 ,...,j k )) = -( 0a)d t ((ji,---,jk)); 

jl<a<j k 

we thus have linkage disequilibria of all orders that decouple and decay exponentially. It was 
shown in (6j p. 25] that the collection of d t ({ji, . . . , jfc)), for all index sets . . . ,jk} C S 
and all choices of letters x^ 6 Xj i} 1 < i < k, is complete in that it uniquely determines pt- 

An example. Consider S = {0,1,2,3}. The highest (fourth-order) LDE is, on the basis of 
([TO]) and (33)), given by 

^((0,1,2,3)) =(T (^)) ((0,1,2,3)) = (Rnipt)) ((0,1, 2, 3)) 

HcL 

= [0, 1, 2, 3] - [0] [1, 2, 3] - [0, 1] [2, 3] - [0, 1, 2] [3] 
+ [0][l][2,3] + [0][l,2][3] + [0, 1][2][3] 
-[0][1][2][3], 

where we have now used the shorthand [ji,---,jk] '■= Pt{{jl, ■ ■ • ,3k))- By (fl5)) . we have 
dt«0, 1, 2, 3)) = -(q 1/2 + q 3/2 + g 5/2 )d t ((0, 1, 2, 3}). 
The third-, second-, and first-order LDEs are given by 

<k{{h,32,h)) = b'l, 32, js] ~ [ji] b'2, h] - \j1J2] [73] + [ji] b'2] b'3], 
dt({ji,32)) = {ji,k} ~ b'l] b'2], 

rft(0i» = bi], 

where the latter correspond to the letter frequencies, as usual. 

Obviously, the d\ are correlation functions that measure the dependence between sites, at 
the various orders. It is important to note that they agree with other measures of LDE (see [9j 
Ch. V.4.2] for an overview) only up to order two. From order three onwards, they are special in 
that terms like [0, 2] [1] or [0, 3] [1, 2] are absent. This is due to the fact that our d\ are based on 
ordered partitions of 5, and thus only contain terms 'produced' by composite recombinators; 
they are therefore adapted to single-crossover dynamics. In contrast, conventional LDEs or 
central moments are based on unordered partitions, which cannot be produced by composite 
recombinators. 

3. The stochastic model and its simulation 

The finite population counterpart of our deterministic model is the Moran model with 
single-crossover recombination. To simplify matters (and in order to clearly dissect the indi- 
vidual effects of recombination and resampling), we shall assume that resampling (traditionally 
referred to as genetic drift) and recombination occur independently of each other. More pre- 
cisely, we assume a finite population of fixed size N, in which every individual experiences, 
independently of the others, 

• resampling at rate 6/2. The individual reproduces, the offspring inherits the parent's 
type and replaces a randomly chosen individual (possibly its own parent). 

• recombination at (overall) rate g a at link a E L. Every individual picks a random 
partner (maybe itself) at rate g a /2, and the pair exchanges the sites after link a. 
That is, if the recombining individuals have types x and y, they are replaced by 
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the two offspring individuals (x <a ,y >a ) and (y< a ,#> a ), as in the deterministic case, 
and Fig. [2l That is, the genetic material is conserved, and any resampling effect is 
excluded. As before, the per-capita rate of recombination at link a is then g a , because 
both orderings of the individuals lead to the same event. To avoid degeneracies, we 
shall assume throughout that g a > for all a G L. 

As in the deterministic model, the factor 1/2 arises every time ordered pairs (of an active 
individual and a randomly chosen partner) are considered (it is the same factor that also occurs 
when the Moran and Wright-Fisher models are compared p2, P- 23]). Note that the randomly 
chosen second individual (for resampling or recombination) may be the active individual itself; 
then, effectively, nothing happens. One might, for biological reasons, prefer to exclude these 
events by sampling from the remaining population only; but this means nothing but a change 
in time scale of order 1/N. 

To formalise this verbal description of the process, let the state of the population at time 
t be given by the collection Z t = (Z t {x)) xeX G E := {z G {0, 1, iV} |x| | Y, x z{x) = N}, 
where Z t (x) is the number of individuals of type x at time t; clearly, Ylx&x %t{%) = N. We 
also use Zt in the sense of a (random counting) measure, in analogy with ujt (but keep in 
mind that Z t is integer-valued and counts single individuals, whereas uit denotes continuous 
frequencies in an infinite population). The letter z will be used to denote realisations of Z t — 
but note that the symbols x, y, and z are not on equal footing (x and y will continue to be 
types). The stochastic process {Z t }t>o is the continuous-time Markov chain on E defined as 
follows. If the current state is Z t = z, two types of transitions may occur: 



(where 6 X is the point measure on x, as before). Note that, in (jTHJ) and (fl7|) . transitions that 
leave E are automatically excluded by the fact that the corresponding rates vanish. 

Note that 'empty transitions' (s(x,y) = or r(x,y,a) = 0) are explicitly included (they 
occur if x = y in resampling or recombination, and if x <a = y <a or x >a = y >a in recom- 
bination). This is convenient since the total reproduction and recombination rates in the 
population (as based on active individuals) are then given by bN/2 and gN/2, respectively, 
independently of z, a fact that comes in handy in the simulations. Since we only require Zt 
at (equidistant) epochs U = iAt, i = 0, 1, 2, . . . , no waiting times for individual events need 
be generated; rather, we may draw the total number of events in a time interval At from 
the Poisson distribution with parameter N(b + g)At/2. The nature of each event is then 
determined in the obvious way (the active individual is of type x with probability z(x)/N; 
the second individual (chosen randomly for resampling or recombination) is of type y with 
probability z(y)/N; the pair performs resampling with probability b/(b + g), or recombination 
with probability g/(b + g); if recombination occurs, then at link a with probability g a /g). 

For simulations, the description of the process through (jT6j) and (fT7l) is perfectly adequate. 
For the theoretical analysis, however, one also needs the rates Q(z, z + v) for the transitions 
z — ► z + v for given z £ E, v £ E — z (where E — z := {v \ z + v £ E}). Here, one must take 



(17) 



(16) 



recombination 



resampling: 



z^z + s(x,y), s(x,y) := 5 X - S y , 

at rate -^-^bz(x)z(y) for (x,y) G X x X 

z — » z + r(x,y,a), 

r(x, y, a) := S {x<a , y>a ) + S( y<a ,x >ol ) - S x - 5 V , 
at rate — — : g a z(x)z(y) for (x, y) G X x X, a G L 
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care of the fact that distinct combinations of x, y, and a may lead to the same 'net' transition. 
Clearly, Q(z, z + v) = (z, z + v ) + (z, z + v ) , where 

(18) Q« (*,* + «) := A Yl < X )<VI 

(x,y)eXxX: 
s(x,y)=v 

(19) QW{z,z + v) := ± Yl Qaz(x)z(y). 

(x,y)eXxX, aeL: 
r(x,y,a)=v 

Of course, the empty sum corresponds to impossible transitions and is understood as 0. Note 
that the sum in (fT8j) contains more than one term only in the case v = 0; however, (fT9j) is a 
'true' sum (of at least two terms) whenever v is of the form r(x, y, a) for some x, y, a. 

Note that, with the rates for 'empty transitions' as ascribed to the diagonal elements Q(z, z), 
the collection (Q(z, z')) z , z 'eE does nof form a Markov generator; but the corresponding gener- 
ator Q = (Q(z, z')) z>z '£E is obtained by defining Q(z,z') = Q(z,z'), for z' ^ z, together with 

This rather informal definition of the Markov chain in terms of its transitions and their 
rates is all that is required for this article - after all, we are in the well-behaved situation of a 
finite state space. Excellent overviews of many aspects of Markov chains in continuous time 
(including the formal details) can be found in [20l Ch. 2] or [2J Ch. 1.8]. 

In contrast to the situation in infinite populations, it now matters whether one or two 
offspring are created at a time. For comparison, we shall, therefore, also consider the alternate 
(biologically more realistic) recombination scheme in which an individual recombines with a 
random partner, but only one of the offspring is kept (chosen randomly), and replaces one of 
its parents (again chosen randomly). That is, instead of (jTTJ) , we now have four recombination 
transitions 

(20) z ^ z - 6 X + 6 {x<a>y>a) , z -)• z - S y + S {y<a!X>J , 

( 21 ) z ^ z - 6 x + 6 (y <a ,*> a )> z ^ z - 5 y + 6 (x <a ,y> a )> 
each at rate 

(22) ■^g a z(x)z(y), for (x, y) G X x X, a G L. 

Note that, in contrast to (jTTJ) , the symmetry between x and y is broken in single transitions. 
Furthermore, only half as many replacements happen here; the corresponding deterministic 
dynamics is, therefore, given by J6]) with g a replaced by g a /2 for all a G L. More importantly, 
the recombination scheme (f20l) . ([2~T1) no longer conserves the genetic material (since net re- 
placements (e.g. of x >a by y >a ) take place), which introduces a (minor) resampling effect not 
present in (PT7l) . If not stated otherwise, however, we will stick to (PT7l) for our recombination 
transitions. 

4. Connections between stochastic and deterministic models 

Let us now explore the connection between the stochastic process {Z t }t>o on E, its nor- 
malised version {Z t }t>o = {%t}t>o/N on E/N, and the solution Ut = ¥>t(wo) of the differential 
equation ((H). We shall first consider the case without resampling (i.e., 6 = 0) and show that 
E(Zt) = (ft(Zo) (and, consequently, E(Zt) = (pt(Zo)), for arbitrary N. For b > 0, still with 
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finite N, this is no longer true; however, Z t converges to (pt(po) for N — > oo, provided Zq 
converges to po. We shall make this convergence explicit, and investigate it by means of 
simulations. 

4.1. General properties of the Moran model with recombination. For lack of reference 
and the sake of completeness, we start by making explicit a plausible - and often implicitly- 
used - fact concerning the evolution of the expectation in finite-state continuous-time Markov 
chains. 

Fact 1. Let {Zt}t>o be a continuous-time Markov chain on a finite subset E ofL d , as defined 
by its transitions from state z to state z + v (z, z + v G E) taking place at rates Q(z, z + v). 
For all t > 0, the expectation of Zt then satisfies the equation 

(23) ±E(Z t )=E(F(Z t )), 
where, for z G E, 

(24) F(z):= vQ(z,z + v). 

v&E-z 

Before turning to the proof, let us note that F(z) may be interpreted as the 'mean rate- 
of-change vector' of the chain in state z. A little later, we shall turn (|23j) into a differential 
equation for E(Zt). 

Proof. Let us first recall the elementary fact that, for the Markov chain starting in Zq = s, we 
have P(Z t = z) = Pt(s,z), where we have suppressed dependence on s on the left-hand side, 

and Pt = (Pt(s, z)) s ,z£E = e"^ is the Markov semigroup corresponding to Q, the generator 
of the chain (obtained from the collection of transition rates (Q(z, z + v)) Z)Z+v ^e, see above). 
Therefore, E(Z t ) = Y^zeE zP t{s, z), and, since f t P(t) = P(t)Q, 

|l(Z t ) = Y. z J t P ^ z ')= E z'P t (s,z)Q{z,z') 
z'eE z,z'eE 

= J2 (z'-z)P t (s,z)Q(z,z') = J2 E vP t (s,z)Q(z,z + v) 

z,z'eE z<=E v€E-z 

= E( Y, vQ(Z u Z t + v))=E(F(Zt)), 

v&E-Zt 

where the third equality sign is due to the fact that Q is a Markov generator. □ 

After this digression, let us return to the Moran model (with recombination and resampling), 
and show that F = <£> on E, where <P continues to be the right-hand side of ((6|). 

To this end, recall that Z t is a (counting) measure, to which our projection operators may 
be applied in the usual way. In agreement with j3|), we write itj.Zt = ZfOTrJ 1 , i.e., 7Tj.Z t is the 
marginal of Z t w.r.t. the sites in /; and, likewise, for a realisation z of Z t . Again, we also use 
shorthands such as Zt(x <a ,*) to denote marginal frequencies (in this case, (ir <a .Zt)(x <a )). 
Furthermore, we set Ej := -Kj.E for I C S. 

To calculate F, let us write F = F^ + F^ r \ where F^ and F^ take care of transitions 
due to resampling and recombination, respectively. By symmetry of the transition rates 
for resampling, F^ = (type frequencies are martingales under resampling alone); we can 
therefore restrict ourselves to the recombination transitions defined by Q' r '(z,z + v) of (fT9l) . 
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Summing, as in (Ti~9l) . over all possibilities for the gain or loss of a single x-individual (cf. Fig. [2J 
lower panel), one obtains for the x-component of F 



F x (z) = F^(z) = v(x)Q^(z,z + v) 

vGE—z: 

(25) v(x)=±1 

= Yl J^e a {z{x <a , *)*(*, x >Q ) - z(x <a ,x >a )z(*, *)) , 



where, of course, z(x <a ,x >a ) = z(x), and z(*,*) = N. One immediately recognises the 
familiar structure of (Q]). We may therefore conclude: 

Fact 2. In the Moran model with resampling and recombination transitions according to (TTo]) 
and p7]l . we have F = <P on E, with <P of ©. 

4.2. Recombination without resampling: expectations for finite N. Let us now return 
to Eq. (j23j) . Note that, in general, it does not lead to a "closed" differential equation for 
E(Zt), because it is not clear whether K(F(Z t )) can be written as a function of E(Zt) alone. 
Clearly, K(F(Z t )) = F(E(Z t )) if F is linear (or affine), as, for example, in Markov branching 
processes, or in the Moran model with mutation, but without recombination. But for nonlinear 
F, this tends to be violated, as nicely illustrated in [TU Ch. 1.4] for the case of the Ricker 
model in ecology. If F is nonlinear but polynomial (the usual case in population genetics 
or chemical reaction systems, for example), (|23|) can still serve as a starting point for an 
expansion involving a hierarchy of moments, which can be closed by suitable approximation 
methods (like, for example, the quasi-linkage equilibrium approach in [7]). 

Our aim in this paragraph is to show that recombination without resampling is another 
(and apparently new) exception: despite its nonlinearity, ^ satisfies K(0(Zf)) = <P(K(Z t )), 
provided the marginals of Zq are independent, which always applies if Zq is fixed. This will 
require a lemma concerning the independence of marginal processes. For I C 5, we introduce 
the 'stretch' 

J{I) := {i G S | min(J) < i < max(I)}, 

and look at the projection of the recombination process on non-overlapping stretches. This is 
the content of the following lemma. 

Lemma 1. Let {Zt}t>o be the recombination process without resampling as defined by the 
transition rates (flT)) . Let A, B C S with J(A)r\J(B) = 0. Then, {-K A .Z t }t>o and {-K B .Z t }t>o 
are conditionally (on Zq) independent Markov chains on Ea and Eb- 

Proof. Clearly, for any given I d S, {irj.Z t }t>o is a stochastic process on Ej. Let us first 
show that TT A .Z t and ir B .Z t are individually Markov chains, and then establish that they are 
(conditionally) independent. For the first step, observe that, when Z t performs the transition 
z — > z + v (on E), with v = $(x <a ,y >a ) + ^(y <a ,x >a ) ~ $x — $y for some a £ L and x,y G X, 
then itj.Zt goes from tTj.z — > tTj.(z + v) on Ej, where TTj.v = ^-w I {x <a ,y >a ) + ^ 1 {y <a ,x >a ) ~ 
^7r (x) ~ fi-ir (y)- The rate for a corresponding 'projected transition' Vj € Ej — zi \zj G Ej) is 
then given by summing all rates of the original process that lead to the transition v T under 
the projection. That is, with r T (xT,y r ,a) := 5i x v ) + 5r v x \ — 5 X — 5 V and the 
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shorthand I <a := I fl {0, 1, . . . , [a\} (and likewise for I >a ), one gets 

£ Q W (v + «) = ^ E ^((7r J ^)(^))((7r J .^)(y / )), 

v.TTj.v^j Xj,yjGXj,aeL: 

r I {x I ,y I ,a)=v I 

for z & E. Since these rates only depend on ttj.z (rather than z itself), they define a col- 
lection of rates Qy (z r , z 1 + Vj) {z l G Ej,Vj <E Ej - zj) so that Y^v.n .v=v Q^ r \z,z + v) = 
Q^P^j.z^j.^z + v)) for all z £ E. Therefore, the marginalised process {7r 7 .Z 4 }i>o is a Markov 

chain on Ej with transitions z 1 — > z 1 +Vj at rates Qj(zj, Zj + Vj). (This is an example of the 
so-called lumping procedure for Markov chains, compare jFIU] and [17] for the general case, or 
[4 J for the sequence context considered here). Our processes {tt A .Z t }t>o and {-K B .Z t }t>o are 
therefore Markov on Ea and Eb, respectively. 

For the second step, we note that, for given a and /, a net transition in 7r 7 .Z t (i.e., ttj.v ^ 0) 
requires [a\, \a] £ J(I). Since J(A) fl J(B) = by assumption, n A .v ^ implies tt b -v = 
and vice versa. A transition in {ir A .Zt}t>o therefore leaves {-K B .Zt\t>o unchanged, and vice 
versa. The joint process Wt '■= {^A-^t^B-^t}^ therefore has generator Qa,b = Qa <8> 1b + 
1a <8> Qb, where Qa is the generator of {ir A -Z t }t>o, Qb the generator of {n B .Z t }t>o, and 1a 
and 1b denote the identity on Ea and Eb, respectively. Hence, the marginal processes are 
conditionally independent, and the claim follows. □ 

Remark 1. Although we have, for ease of notation, formulated the above result for two sub- 
sets A and B only, the proof obviously goes through for any collection Ai, ... , Ak C S, if the 
J(Afc), 1 < k < K, are pairwise disjoint. We then obtain that {tt Ai .Z t }t>o, • • • > \^ A K -^t}t>o 
are conditionally independent Markov chains. If, furthermore, ir Ai .Zo,...,ir A .Zo are in- 
dependent, conditional independence of the marginal processes turns into independence. In 
particular, this is the case if Zq is fixed (since F(Zq G •) is then a point measure on some 
s€E). 

As an immediate consequence, we now arrive at 

Theorem 1. Let {Zt}t>o be the recombination process without resampling (i.e., b = 0), and 
let Zq be fixed. Then, E(Zt) satisfies the differential equation 

^E(Z t )=#(E(Z t )) 
with initial value Zq, and <& from ([6]); therefore, 

E(Zt) = <pt(Z ), forallt>0, 



with (ft from J9]). 
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Proof. Applying Fact(H Eq. (j4]), Lemma [H and the linearity of the expectation, one finds 

a<=L 

= E f t( E K^')) ® (E(T>a-^)) " 

= E f [K«- E (^)) ® K«' E (^)) - 

= #(E(Z t )). 

(Clearly, the third step is the decisive one; it hinges on the independence of the marginals.) 

□ 

Remark 2. Note that Theorem Q] does not hold for the alternate recombination scheme (|22l) . 
This is because the associated resampling effect already destroys the validity of Lemma [H 
which is essential for the proof. We will come back to this in the next paragraph; but let us 
already mention here that the deterministic solution continues to be an excellent approxima- 
tion to the expectation, as long as recombination rates are small. It will also be shown below 
that, in practice, the resampling effect thus introduced is minor as long as recombination rates 
are small, and averages continue to be very close to the deterministic solution. 

Now that the equivalence between the deterministic solution and the expectation has been 
safely established at the level of the type frequencies, it immediately carries over to the linkage 
disequilibria: 

Corollary 1. Under the assumptions of Thm. d one has, for all t > 0, 

E(T G Z t )=T G (<p t (Z )). 
Proof. Use the definitions of T G and Rh, then Remark Q], and finally Theorem [TJ □ 

In particular, if we define 
(26) C t ({ji,...,j k )):=(r { a<ji}U{a>j k } 

(z t )) ((h,...,j k )) 

(so that Ct is the stochastic equivalent of dt of (fT4l) ). we obtain the relation E(Cj ((ji, . . . ,jk))) = 
dt({jli---,jk)), for all cylinder sets (ji,. ..,jk)- 

Fig. [3] illustrates the findings of this paragraph for a four-locus two-allele system with 
recombination and no resampling. We have chosen p = 0.008 and p a = p/3 for a € |, |} = 
L, which corresponds to four sites spaced evenly across a stretch of 8-10 5 bp, at a per-nucleotide 
recombination rate of 10 -8 . The Figure shows haplotype frequencies and highest-order LDEs, 
both as single trajectories and averages over many realisations, as compared to the solution 
of the deterministic recombination equation. For very small populations (N = 100), single 
trajectories fluctuate markedly, but averages over 100 realisations are already indistinguishable 
from the deterministic solution — in line with the property of the expectation just established. 
For larger populations, the stochasticity is already greatly reduced in single trajectories — but 
this is the topic of the next paragraph. 
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The Figure also contains results from the alternate resampling scheme with one offspring 
only, ([20]) - (|22l) . Although Theorem Q] is not exactly valid in this case, the resampling effect 
thus introduced is obviously minor as long as recombination rates are small, and averages 
continue to be very close to the deterministic solution. 

4.3. Recombination and resampling: the infinite population limit. Let us now include 
resampling, at rate 6/2 > 0, and consider the stochastic process \Z\ ' }t>o defined by both 
(fl~6j) and (fl~7l) . where we add the upper index here to indicate the dependence on N. Now, ^ 
and E no longer commute. The processes {ir <a .Z t }t>o and {ir >a .Zt}t>o are still individually 
Markov, but their resampling events are coupled (replacement of y <a by x <a is always tied 
to replacement of y >a by x >a ). Hence the marginal processes fail to be independent, so that 
no equivalent of Lemma [1] holds. 

Let us, therefore, change focus and consider the normalised version {zj: N ^}t>o = {z[ N ^} t >Q/N . 
It seems to be general folklore in population genetics that, in the limit N — > oo, the relative 
frequencies of the Moran or Wright-Fisher model cease to fluctuate and are then given by 
the solution of the corresponding deterministic equation. This is implied no matter which 
evolutionary forces (like mutation, selection, or recombination) are included into the model; 

in our case, {z[ N ^} t >Q should be described by the differential equation ([TJ as N — > oo. How- 
ever, the limit theorem behind this is usually not made explicit, and, in fact, does not seem 
to be well known in the population genetics literature. Indeed, it is given by a very general 
law of large numbers due to Ethier and Kurtz [13j Thm. 11.2.1], which provides the formal 
justification for a very large class of models in biology that are stochastic at the microscopic 
scale but are adequately described deterministically if the population size is large; they range 
from biochemical reaction kinetics to population dynamics and population genetics. For our 
special case, it reads as follows. 

Proposition 1. Consider the family of processes {zj: N ^} t >o = jj{z[ NS} } t >o, N = 1,2,..., 
where {Z[ ■ }t>o is defined by (fTBT) and (fl~7]) . Assume that the initial states are chosen so that 
lini/v^oo Zq N ^ = po. Then, for every given t > 0, one has 

(27) lim sup|Z s (7V) -p s \ = 

with probability 1, where p s := (f s (po) is the solution of the deterministic recombination equa- 
tion ©. 

Proof. To apply Thm. 11.2.1 of [H], we need a linear scaling of the transition rates with N. 

More precisely, we must show that the transition rates Q^ N \z, z + v) of the process {^j }t>o 
are, for all z, z + v € E, of the form 

(28) Q (N \z,z + v) = Nq v (z/N), for all N, 

\x\ 

with non-negative functions q v defined on a subset of M> . Setting 

(29) #(p) := \ £ p(x)p(y), 

(x,y)GXxX: 
s(x,y)=v 

(30) q { ;\p) := \ J2 Q a v{x)p{y), 

(x,y)£XxX,a€L: 
r(x,y,a)=v 



16 



ELLEN BAAKE AND INKE HILDEBRANDT 



together with q v := qi, + q^ , and recalling that 

QW (z, z + v) = (z, z + v) + (z, z + v) 

(from (fT8j) and (fT9l) - now with notational dependence on N), it is obvious that (|28l) is 
indeed satisfied. (Observe that this just reflects the relation z(x)z(y)/N = N^^.) The 
normalised process {Z^} t >o on E/N has the corresponding transitions z/N — > z/N + v/N, 
again at rates Nq v (z/N). Thus, the collection of processes {zW} t>o constitutes what is 
known as a density- dependent family corresponding to the q v [13\ p. 455]. 

Now, for such a density-dependent family, Thm. 11.2.1 of [13j implies ([271) if pt solves 
fit = f(pt), with initial value po, and f(p) := ^2 v vq v (p). Proceeding as in (|21"1) and ([25]) . we 
obtain f(p) = 'Pip) in analogy with Fact El □ 

Note that the convergence in (|27l) applies for any given t, but need not hold for t — > oo; we 
shall come back to this point in the Discussion. Note also that the convergence carries over 
to linkage disequilibria, since Tc(Z t ) converges to Tc(pt) (in the above sense). 

The question that remains is whether this limit result bears any relevance to real popula- 
tions, which are always finite. How large must N be for the infinite-population limit to yield 
a reasonable approximation? 

We have investigated this by means of simulations of our four-locus two-allele system 
(Fig. |4]). As is to be expected, single realisations, as well as averages, approach the de- 
terministic limit with increasing iV; and for N = 10 5 , the stochastic process is already very 
close to the limit. This is observed at the level of type frequencies, as well as linkage disequi- 
libria. The situation is very similar to that without resampling (Fig. [31 (a) and (e)), except 
that somewhat larger population sizes are required due to the increased stochasticity induced 
by resampling. In contrast, averages over small populations are not expected to, and in fact 
do not, get close to the deterministic solution. 

5. Discussion 

The main purpose of this article has been to clarify some relationships between the Moran 
model with recombination and the deterministic recombination model (both with single cross- 
overs). To separate the effects of recombination from those of resampling, we formulated 
the Moran model in its 'decoupled' version, with independent recombination and resampling 
events; this approach is also taken in [21], for example. The coupling of recombination to 
resampling (which happens to be biological reality) is thus neglected. Put differently, our 
model describes correctly the resampling effects due to reproduction events that do not involve 
recombination, but neglects those resampling events that occur in the course of reproduction 
associated with recombination. But recombination is rare (relative to reproduction), at least 
at the molecular level aimed at by the single-crossover model; and the bias introduced by this 
simplification is accordingly small, as also illustrated by our simulations of the alternative 
recombination scheme ([20]) - (l22l . 

As the main result of this article, we have shown: 

(1) For recombination without resampling, the expected type frequencies are given by the 
deterministic dynamics, for arbitrary (even small) population sizes. Although this is 
not a biologically realistic situation, it yields insight into the Moran model with recom- 
bination, and establishes a relationship between finite and infinite populations that is 
somewhat unexpected in view of the inherent nonlinearity of recombination. The key 
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observation that led to this result rests upon the fact that the marginal processes (i.e., 
the type frequencies at the sites before resp. after a given link) are independent Markov 
chains. Of course, this is related to the fact that, in the absence of resampling, the 
genetic material is completely conserved (just rearranged); in particular, types cannot 
go to fixation. 

(2) The combined model with resampling deviates (in expectation) from the deterministic 
dynamics; in view of the findings above, these deviations are solely due to resampling, 
rather than recombination. But the infinite population limit continues to apply, and 
is, again, given by the deterministic recombination equation. We have investigated its 
range of validity here for one representative scenario (four biallelic loci, a time horizon 
of T = 300, an expected number of Tg a = 300 • 0.008/3 = 0.8 recombination events 
per link and individual, and Tb/2 = 300 resampling events per individual); then, a 
population of moderate size (N = 10 5 ) is already close to the deterministic limit. It 
should be noted, however, that this result is expected to further vary with: 

(a) The number of sites, or of alleles per site (or, more generally, the size of the type 
space): The deterministic limit can only be a good approximation if all types 
are well-populated; for a larger type space, the required population size will be 
larger. But situations with four loci (or variable nucleotide sites) already cover 
many interesting biological situations; and typing them (and determining the 
corresponding four-way LDEs) is, after all, a veritable task that yields a great 
deal of information. It remains to be investigated, however, how the quality of 
the approximation changes when there are more than two alleles per site. 

(b) The time horizon: The law of large numbers (I27p holds for every given, finite time 
horizon, but need not carry over to t — > oo. Indeed, if resampling is present, the 
population size required to get close to the deterministic solution is expected to 
grow over all bounds with increasing t. This is because, for every finite N, the 
Moran model with resampling and recombination is an absorbing Markov chain, 
which leads to fixation (i.e., a homogeneous population of uniform type) in finite 
time with probability one (for the special case of just two types without recom- 
bination, the expected time is of order N, if the initial frequencies are both 1/2 
[T4l p. 93]). In sharp contrast, the deterministic system never loses any type, 
and the stationary state, the complete product measure (fT2l) . is, in a sense, even 
the most variable state accessible to the system. For increasing N, finite popu- 
lations stay close to the deterministic limit for an increasing length of time (see 
Fig. [J]). Indeed, our main interest here is this time horizon during which sub- 
stantial changes in LDE occur, and this is described by the deterministic model; 
in contrast, the deterministic solution does not provide a meaningful picture for 
the equilibrium situation. The case would be very different if mutation were in- 
cluded into the model, since this would turn the absorbing Markov chain into an 
ergodic one, whose stationary distribution allows a meaningful comparison with 
the deterministic dynamics even for t — > oo. 

Let us finally discuss implications for the corresponding model in discrete time, that is, the 
Wright-Fisher model with recombination. Again, we may consider 

• a model without resampling (i.e., the only events are single crossovers between pairs 
of individuals, with pairs of offspring replacing pairs of parents): Although one might 
assume this model to behave like the corresponding discrete-time dynamical system for 
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finite N, in expectation, we indeed stipulate that this is not the case. The reason is that, 
in discrete time, the marginal processes (related to non-overlapping stretches) cease to 
be independent. This is because such independence would imply the possibility of two 
or more crossovers in one generation - in contrast to the single-crossover assumption. 
For the same reason, the deterministic single-crossover model in discrete time is much 
more difficult to solve than in continuous time, see the discussion in [6]. 
• recombination combined with resampling: It is strongly expected that, for N — > oo, 
there is again a law of large numbers, which makes the relative frequencies converge 
to the corresponding deterministic dynamics, in analogy with Prop. [TJ However, no 
explicit equivalent of the underlying Theorem 11.2.1 of [13] is known to the authors. 
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(b) haplotype frequency, averaged over 100 runs 
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FIGURE 3. Recombination without resampling in a 4-locus 2-allele system. Sites: S = 
{0,1,2,3}, types: X — {0, l} 4 ; recombination rates: p = 0.008, g 1 , 2 = f? 3 / 2 = 65/2 = p/3; 
initial conditions: Zo(0000) = Zo(llll) = N/2 (no other types are present). Horizontal axis: 



time (t). Panels (a) - (d) show realisations of Zt(0000) for various population sizes, as single 



trajectories (a) or means over 100 (b) realisations; also shown is pt(0000) = (<,5t(^o))(0000) 
(the deterministic solution or N — » 00 limit), which equals E(Z t ) by Thm. [l] (and is partly 



hidden between the diamonds in (b) and (d)l. Clearly, the sample mean is well described 



by the expectation, where averaging is faster in larger populations. Panel (c) shows a single 



trajectory, and panel (d) an average over 100 realisations, for a simulation according to 
Eqs. I|20p - I|22p . in which only one offspring individual is produced, rather than two as in 
Eq. I|17p . the assumption underlying all other simulations. Obviously, this only introduces a 
minor resampling effect and corresponding systematic deviation from the deterministic limit 
(but note that, relative to the usual resampling scheme, the dynamics is slowed down by a 
factor of 1/2). Panels[(e)]and[(f)]show Ct((0, 1, 2, 3)) (of Eq. ((26])), for type 0000, and various 

or means 



population sizes, as single trajectories (e) 



(f) over 100 realisations, and compares 



them with the deterministic quantity (or N — » 00 limit) d t ({0, 1, 2, 3)). 
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FIGURE 4. Recombination with resampling in the 4-locus 2-allele system. Sites, 
types, recombination rates, and initial conditions as in Fig. [3l Resampling rate: 6 = 2. 
Horizontal axis: time (£). Panels (a) and (b) show realisations of Z t (0000) for various 
population sizes, as individual trajectories (a) or averages over 100 realisations |(b)| 
also shown is pt(0000) = (ipt(Zo)) (0000) (the corresponding deterministic solution, or 
N — > oo limit), partly hidden between the N = 10000 stars in |(b)| Single trajectories 
| (a) | approach the infinite population limit already for moderate population sizes (N = 
10 5 ). In contrast, averages | (b) | deviate strongly from the deterministic solution if the 
population is small (error bars on the N — 100 curve correspond to 95 % confidence 
intervals), but are hard to distinguish from the infinite population limit for N = 
10000. Analogous results hold true for single trajectories (c) , and averages over 100 
realisations |(d) | of the highest-order LDE, Ct((0, 1, 2, 3)), evaluated for type 0000; the 
N = oo line is d t - 



